1 Motivation

Gully erosion is an indicator of land degradation, which occurs due to the removal of soil along drainage channels through surface water runoff. Given the significant impact of gullies, it is essential to develop reliable and targeted analysis methods to better understand their spatial occurrence and enable improved representation of these erosion processes in catchment sediment budget models.

The aims of the study were:

2 Methodology

2.1 Study areas

Mangatu catchment for training and testing; Wairoa for model validation (Fig. 2.1).

Study areas for analysis. The Mangatu catchment was used for model training and testing and subsets of the Wairoa catchment for validation

Figure 2.1: Study areas for analysis. The Mangatu catchment was used for model training and testing and subsets of the Wairoa catchment for validation

2.2 Data

2.2.1 Reference data

The reference data consists of a combination of all gully features for 1939, 1957, 1960, 1970, 1988, 1997 from Marden et al. (2012) and Marden, Herzig, and Basher (2014) (Fig. 2.2). These data was manually delineated based on historical aerial photography.

Gully reference data for the Mangatu catchment. From @Marden2012 and @Marden2014. Background: Aerial imagery collected between 2012 and 2013.

Figure 2.2: Gully reference data for the Mangatu catchment. From Marden et al. (2012) and Marden, Herzig, and Basher (2014). Background: Aerial imagery collected between 2012 and 2013.

The original data delivered by MWLR had a problem with co-registration, where polygons were not matching their correct geographical locations. To correct this, we used an affine transformation, see more details here.

The data was split into training and testing with a 70:30 ratio, stratified by year. To use it as reference data, the polygons were merged and dissolved.

library(sf)
library(tidyverse)
library(here)
g = read_sf(here("data/reference/gullies.shp"))
set.seed(7319)
g = g %>% mutate(id = row_number())
train = g %>% 
  group_by(year) %>% 
  slice_sample(prop = 0.7) %>% 
  ungroup() 
test = g %>% 
  as_tibble() %>% 
  anti_join(train, by = 'id') %>% 
  st_as_sf() %>% 
  select(-id)
train = train %>% 
  select(-id)

write_sf(
  train,
  dsn = here("data/reference/gullies_train.shp"),
  delete_layer = TRUE
)
write_sf(
  test,
  dsn = here("data/reference/gullies_test.shp"),
  delete_layer = TRUE
)

train %>% 
  st_union() %>% 
  st_cast("POLYGON") %>% 
  st_sf(class = 1) %>% 
  write_sf(
    dsn = here("data/reference/gullies_train_union.shp"),
    delete_layer = TRUE
  )

test %>% 
  st_union() %>% 
  st_cast("POLYGON") %>% 
  st_sf(class = 1) %>% 
  write_sf(
    dsn = here("data/reference/gullies_test_union.shp"),
    delete_layer = TRUE
  )

2.2.2 LiDAR DEM data

The LiDAR DEM data corresponds to the year 2019 with a spatial resolution of 1 m. We used SAGA to calculate 12 terrain derivatives based on the LiDAR DEM, which included:

  1. Slope
  2. Aspect
  3. Profile curvature
  4. Plan curvature
  5. Hillshade
  6. Slope length and steepness (LS) Factor
  7. Mass Balance Index
  8. Stream Power Index
  9. Texture
  10. Terrain ruggedness index
  11. Terrain wetness index
  12. Vertical distance to channel network

2.3 Deep Learning Model

We tested a region-based convolutional neural network (Mask-RCNN) deep learning approach for object detection to map gully features. Mask-RCNN is a model for image segmentation, developed on top of Faster R-CNN (He et al. 2017). See more information on how Mask-RCNN works here.

The deep learning was performed woth a RGB combination of the 12 terrain derivatives. Latest tests were acceptable with: ‘lsfct,’‘shade,’‘tridx.’ Labelled chips for training data were generated with the reference gully features.

The main workflow took place in ArcGIS Pro 2.9.2. The workflow in figure 2.3 is recommended. The operating system used was Microsoft Windows 10 Enterprise, with processor characteristics Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz, 6 cores and 12 logical processors, 64GB of installed RAM and NVIDIA GeForce GTX 1070 GPU with 8GB of dedicated memory and 6.1 compute capability.

Figure 2.3: Deep Learning workflow for object detection in ArcGIS Pro

These steps can be applied using the GUI or with the ArcPy Python modules for ArcGIS.

Using the ArcPy and learn modules, we established the workflow below within a Jupyter notebook and a ArcGIS Pro Project.

See documentation of the Python API here.

  1. Pre-process data:
# Import system modules and check out ArcGIS Image Analyst extension license
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")
from arcpy.ia import *
from arcpy import env
data_dir = r'data'
env.workspace = data_dir

# Composite bands
arcpy.CompositeBands_management(
    "mangatu_cplan.tif;mangatu_cprof.tif;mangatu_ddgrd.tif;mangatu_lsfct.tif;mangatu_mbidx.tif;mangatu_shade.tif;mangatu_slope.tif;mangatu_spidx.tif;mangatu_textu.tif;mangatu_tridx.tif;mangatu_twidx.tif;mangatu_vdcnw.tif",
    "terrain.tif"
)

# Rename bands
terrain = r'data\terrain.tif'
terrain_raster = Raster(terrain, True)

terrain_raster.renameBand(1, 'cplan')
terrain_raster.renameBand(2, 'cprof')
terrain_raster.renameBand(3, 'ddgrd')
terrain_raster.renameBand(4, 'lsfct')
terrain_raster.renameBand(5, 'mbidx')
terrain_raster.renameBand(6, 'shade')
terrain_raster.renameBand(7, 'slope')
terrain_raster.renameBand(8, 'spidx')
terrain_raster.renameBand(9, 'textu')
terrain_raster.renameBand(10, 'tridx')
terrain_raster.renameBand(11, 'twidx')
terrain_raster.renameBand(12, 'vdcnw')
terrain_raster.getRasterBands()
  1. Export training data:
# Define variables
terrain = r'data\terrain.tif'
terrain_raster = Raster(terrain, True)
gullies = r'data\gullies_train.shp'
out_chips = r'work\terrain_3lay'

# Select bands
terrain_sel = ExtractBand(terrain_raster, band_names=['lsfct','shade','tridx'])

# Execute ArcPy tool for data export
ExportTrainingDataForDeepLearning(
    terrain_sel, out_chips, gullies, 
    "TIFF", 1024, 1024, 512, 512,
    "ONLY_TILES_WITH_FEATURES", "RCNN_Masks", 0, "",
    0, None, 0, "MAP_SPACE", "PROCESS_AS_MOSAICKED_IMAGE",
    "NO_BLACKEN", "FIXED_SIZE"
)

# Prepare data 
from arcgis.learn import prepare_data
data = prepare_data(
    out_chips, 
    batch_size=4, chip_size=512,
    imagery_type='ms' #multispectral
)
data.show_batch(rgb_bands=[0,1,2])
  1. Model training:
# Define model architecture
from arcgis.learn import MaskRCNN
maskrcnn = MaskRCNN(
    data, backbone='resnet50', pointrend=True
)

# Find efficient learning rate
maskrcnn.lr_find() # result goes into next command

# Fit the model
maskrcnn.fit(epochs=20, lr=maskrcnn.lr_find())

# Unfreeze the model 
maskrcnn.unfreeze()

# Plot losses
maskrcnn.plot_losses()

# Show results
maskrcnn.show_results(rows=3)

# Save model
maskrcnn.save('gully_model_lsfctshadetridx_20ep')
  1. Detect objects:
arcpy.env.processorType = "GPU"
# Set local variables
terrain_sel = ExtractBand(terrain, band_names=['cplan','lsfct','slope','spidx','tridx','twidx'])
in_raster = terrain_sel
out_detected_objects = "work/terrain_3lay/results/gully_lsfctshadetridx_20ep.shp"
in_model_definition = "work/terrain_3lay/models/gully_model_lsfctshadetridx_20ep/gully_model_lsfctshadetridx_20ep.emd"
model_arguments = "padding 256;threshold 0.5;batch_size 4"
run_nms = "NO_NMS"
confidence_score_field = "Confidence"
class_value_field = "Class"
max_overlap_ratio = 0
processing_mode = "PROCESS_AS_MOSAICKED_IMAGE"

# Check out the ArcGIS Image Analyst extension license
arcpy.CheckOutExtension("ImageAnalyst")

# Execute 
DetectObjectsUsingDeepLearning( in_raster, out_detected_objects, 
   in_model_definition, model_arguments, run_nms, confidence_score_field, 
   class_value_field, max_overlap_ratio, processing_mode)

For this step we tested several padding sizes (256, 128, 64, 32). Every run of the tool resulted in slightly different results.

2.4 Validation

To validate the deep learning model, gully features from the Wairoa catchment, northern Hawke’s Bay would be ideally used. This data is from the SedNetNZ project, where erosion features including gullies, earthflows and cliffs were mapped on a 5x5 km grid. The features were mapped based on aerial imagery. For the Northern Wairoa catchment, the Gisborne 2017-19 regional aerial imagery with 0.3 m spatial resolution and for the Southern Wairoa catchment, the HBRC 2019-20 regional aerial imagery with 0.3 m spatial resolution were used.

The validation would take place by selecting the mapped gully features from the dataset, and using them as a base for comparison with the results of the deep learning model applied for this area. Measures such as Intersection over Union (IoU) and accuracy can be then computed to evaluate the performance of the model using an unbiased and completely new dataset.

2.4.1 Current status

For now, work has been delayed due to the delivery of LiDAR for the Hawke’s Bay region (expected now in July 2022). However, some other LiDAR tiles were available covering two grid cells (62 and 119) from the erosion feature mapping. An initial exploration of these tiles are shown in figure 2.4.

Click to expand!

Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.

Figure 2.4: Selected examples within the Wairoa catchment for erosion features. Grids 62 and 119.

From this initial exploration, several points regarding the data emerged:

  1. Data completeness: There are several locations within these two grids that at the (non-expert) naked-eye, would look like erosion features however they are not mapped. The layers were called “initial mapping” so the question on whether an updated version of this mapping is available arises.

  2. Feature characteristics: Some of the features mapped as gullies and cliffs are very similar to each other. We wonder if there was any criteria or rules followed to decide to classify the features among the presented categories.

  3. Feature delineation: It might be that the background data we have used to look into these features is not the same as the one used for mapping them. However, we see some discrepancies between what can be observed and the actual mapped feature. Hence, we wonder how accurate this delineation really is.

Further grid cells could be explored once the new LiDAR data is there but we believe that these issues should be clarified early enough before using this new area as a validation site. If the data is incomplete, the delineations are not accurate, and the features happen to be mixed between cliffs and gullies, then we think that 1) the deep learning model will have a hard time differentiating the gully features, and 2) any validation measures computed with these data reference data will be biased, probably presenting several “false positives” that were actually correctly mapped features. To be able to do a good validation of the model, we require an area with a 100% accurate and complete mapping of the features.

3 Results

## Visualize results from ArcGIS deep learning workflow:
library(sf)
library(stars)
library(tmap)
library(here)
library(tidyverse)
library(patchwork)

ref_dir = here("data/reference/")
output_dir = here("deep_learning/DetectedObjects_update/output/")
output_fs = list.files(
  output_dir,
  pattern = "^lsfctshadetridx_train.*shp$", 
  full.names = TRUE
)
output_name = list.files(
  output_dir,
  pattern = "^lsfctshadetridx_train.*shp$", 
  full.names = FALSE
) %>% str_extract("pad[0-9]+") %>% 
  str_extract("[0-9]+")

read_id = function(x, name) {
  read_sf(x) %>% 
    mutate(pad = as.numeric(name))
}
detectedObjects = map2(output_fs, output_name, read_id) %>% 
  bind_rows() %>% 
  arrange(Confidence)

train = here(ref_dir, "gullies_train_union.shp") %>% 
  read_sf() %>% 
  st_transform(st_crs(detectedObjects))
test = here(ref_dir, "gullies_test_union.shp") %>% 
  read_sf() %>% 
  st_transform(st_crs(detectedObjects))

detectedObjects = detectedObjects %>% 
  mutate(area = as.numeric(st_area(geometry)))

detectedObjectsFiltered = detectedObjects %>% 
  filter(area > 2000 | Confidence > 80)

padcount = ggplot(detectedObjectsFiltered) +
  geom_histogram(aes(Confidence)) +
  facet_wrap(~pad) +
  labs(y = "Count", x = "Confidence (%)")

padarea = ggplot(detectedObjectsFiltered) +
  aes(Confidence, area) +
  geom_point() +
  geom_smooth(method = "loess") +
  scale_y_continuous(trans = "log10") +
  facet_wrap(~pad) +
  labs(y = "Area (m2)", x = "Confidence (%)")

ggsave(padcount, filename = 'pad_count_conf.png',
       width = 20, height = 18, units = 'cm', dpi = 300)
ggsave(padarea, filename = 'pad_area_conf.png',
       width = 20, height = 18, units = 'cm', dpi = 300)

padconfmap = tm_shape(train) +
  tm_fill(col = "grey70") +
  tm_shape(detectedObjectsFiltered) +
  tm_polygons(
    col = "Confidence", palette = "viridis",
    legend.hist = TRUE
  ) +
  tm_facets(by = "pad", ncol = 2) +
  tm_layout(
    legend.hist.width = 1,
    legend.outside.position = "right",
    legend.outside.size = 0.25
  )

tmap_save(padconfmap, filename = 'pad_conf_map.png',
       width = 20, height = 17, units = 'cm', dpi = 300)

dist = ggplot(detectedObjectsFiltered) +
  geom_sf(aes(fill = Confidence), col = NA) +
  scale_fill_viridis_c() +
  theme_void() +
  theme(panel.background = element_rect(fill = "black")) 

ref = ggplot() +
  geom_sf(data = train, col = 'red', fill = NA) +
  geom_sf(data = test, col = 'orange', fill = NA) +
  theme_void() 
dims = get_dim(dist)
ref_aligned = set_dim(ref, dims)

ggsave(dist, filename = 'distribution.png',
       width = 20, height = 18, units = 'cm', dpi = 300)
ggsave(ref_aligned, filename = 'reference.png',
       width = 20, height = 18, units = 'cm', dpi = 300)

# ggplot(detectedObjectsFiltered) +
#   geom_violin(aes(y = as.numeric(area)*0.0001, x = as.factor(pad))) +
#   labs(y = "Area (ha)", x = "Padding size", title = "Area of detected objects by padding size")

The reference gully features were split into training and testing (Fig. 3.1). The deep learning model trained for gully detection was applied to the same Mangatu area to detect gully features.

Reference gully features split into training (red) and testing (orange).

Figure 3.1: Reference gully features split into training (red) and testing (orange).

Different padding sizes were applied during object detection. We filtered the results to those detected gully features with an area larger than 0.2 ha or with a detection confidence higher than 80%.

When looking at the padding sizes individually we can see that the padding number does not seem to result in higher confidence (Fig. 3.2). The same for the feature area, larger features do not seem to have a larger confidence, and their behavior is equal among image padding.

Detection confidence feature count facetted by padding size.

Figure 3.2: Detection confidence feature count facetted by padding size.

Detection confidence feature area facetted by padding size.

Figure 3.3: Detection confidence feature area facetted by padding size.

A final look at the results per padding and their geographical distribution (Fig. 3.4).

Gully features detection confidence facetted by padding size.

Figure 3.4: Gully features detection confidence facetted by padding size.

Given this homogeneity among the detected features, the results were combined as shown in Fig. 3.5.

Resulting gully features combined from all padding sizes (32, 64, 128, 256) and filled by their detection confidence.

Figure 3.5: Resulting gully features combined from all padding sizes (32, 64, 128, 256) and filled by their detection confidence.

3.1 Accuracy assessment

Using the test dataset, we can examine how many of the detected features overlap with the reference gullies that were not used for training (Fig. 3.6. A quick analysis of the intersecting area between the test features and detected gullies can be observed in (Fig. 3.7.

# detectedObjectsFilteredInt = detectedObjectsFiltered %>% 
#   rowwise() %>% 
#   mutate(
#     areaInt = st_intersection(geometry, test$geometry) %>%
#       st_area() %>% sum()
#   )
# 
# save(detectedObjectsFilteredInt, file = "backup_validation_gullies.RData")
load(file = "backup_validation_gullies.RData")
detectedObjectsFilteredInt

testoverlap = tm_shape(test) +
  tm_borders(col = "orange") +
  tm_shape(detectedObjectsFilteredInt) +
  tm_polygons(
    col = "Confidence", palette = "viridis",
    border.alpha = 0, alpha = 0.8
  ) +
  tm_layout(
    legend.outside = TRUE,
    legend.outside.position = "right",
    legend.outside.size = 0.25
  )

tmap_save(testoverlap, filename = 'acc_test_overlap.png',
       width = 20, height = 17, units = 'cm', dpi = 300)

t = detectedObjectsFilteredInt %>% 
  mutate(intPerc = as.numeric(areaInt)/area*100)

tfilt = t %>% 
  filter(intPerc > 0) 
  
testint = tm_shape(test) +
  tm_borders(col = "grey30") +
  tm_shape(tfilt) +
  tm_polygons(
    col = "intPerc", palette = "viridis",
    breaks = c(0, 25, 50, 75, 400),
    border.alpha = 0, alpha = 0.8,
    legend.hist = TRUE
  ) +
  tm_layout(
    title = "Intersection percentage (%)",
    legend.hist.width = 1,
    legend.outside = TRUE,
    legend.outside.position = "right",
    legend.outside.size = 0.25
  )
tmap_save(testint, filename = 'acc_test_int.png',
       width = 20, height = 17, units = 'cm', dpi = 300)
Resulting gully features combined overlapping gully reference in test dataset

Figure 3.6: Resulting gully features combined overlapping gully reference in test dataset

Resulting gully features and the intersecting area to gully reference in test dataset

Figure 3.7: Resulting gully features and the intersecting area to gully reference in test dataset

4 Conclusions

The data-driven deep learning framework shows promising results regarding gully presence and absence. Validation resulted in detected gullies overlapping 60% of the reference gully area. However, a limiting factor related to the available reference data that was mapped on historical aerial photography and does not align with the LiDAR DEM.

References

He, Kaiming, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. 2017. Mask R-CNN.” In 2017 IEEE International Conference on Computer Vision (ICCV), 2017-Octob:2980–88. IEEE. https://doi.org/10.1109/ICCV.2017.322.
Marden, Michael, Gregory Arnold, Anne Seymour, and Randolph Hambling. 2012. History and distribution of steepland gullies in response to land use change, East Coast Region, North Island, New Zealand.” Geomorphology 153-154: 81–90. https://doi.org/10.1016/j.geomorph.2012.02.011.
Marden, Michael, Alexander Herzig, and Les Basher. 2014. Erosion process contribution to sediment yield before and after the establishment of exotic forest: Waipaoa catchment, New Zealand.” Geomorphology 226: 162–74. https://doi.org/10.1016/j.geomorph.2014.08.007.